

The Plug-in Approach for Average-Reward and Discounted MDPs: Optimal Sample Complexity Analysis

Zurek, Matthew, Chen, Yudong

arXiv.org Machine Learning

We study the sample complexity of the plug-in approach for learning $\varepsilon$-optimal policies in average-reward Markov decision processes (MDPs) with a generative model. The plug-in approach constructs a model estimate and then computes an average-reward optimal policy in the estimated model. Despite representing arguably the simplest algorithm for this problem, the plug-in approach has never been theoretically analyzed. Unlike the more well-studied discounted MDP reduction method, the plug-in approach requires no prior problem information or parameter tuning. Our results fill this gap and address the limitations of prior approaches, as we show that the plug-in approach is optimal in several well-studied settings without using prior knowledge. Specifically, it achieves the optimal diameter- and mixing-based sample complexities of $\widetilde{O}\left(SA \frac{D}{\varepsilon^2}\right)$ and $\widetilde{O}\left(SA \frac{\tau_{\mathrm{unif}}}{\varepsilon^2}\right)$, respectively, without knowledge of the diameter $D$ or uniform mixing time $\tau_{\mathrm{unif}}$. We also obtain span-based bounds for the plug-in approach, and complement them with algorithm-specific lower bounds suggesting that they are unimprovable. Our results require novel techniques for analyzing long-horizon problems which may be broadly useful and which also improve results for the discounted plug-in approach, removing effective-horizon-related sample size restrictions and obtaining the first optimal complexity bounds for the full range of sample sizes without reward perturbation.
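The two steps of the plug-in approach described above can be illustrated with a minimal NumPy sketch. This is not the paper's analysis, only a toy instance under assumed parameters: a small random MDP, a fixed number of generative samples per state-action pair, and relative value iteration as the average-reward solver in the estimated model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy MDP: S states, A actions, true transitions P_true[s, a] and rewards r[s, a].
S, A = 3, 2
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # shape (S, A, S)
r = rng.random((S, A))

# Plug-in step 1: estimate the model from n generative samples per (s, a) pair.
n = 2000
P_hat = np.zeros((S, A, S))
for s in range(S):
    for a in range(A):
        samples = rng.choice(S, size=n, p=P_true[s, a])
        P_hat[s, a] = np.bincount(samples, minlength=S) / n

# Plug-in step 2: compute an (approximately) average-reward optimal policy in
# the estimated model, here via relative value iteration on P_hat.
h = np.zeros(S)
for _ in range(5000):
    Q = r + P_hat @ h          # Q[s, a] = r[s, a] + sum_s' P_hat[s, a, s'] h[s']
    h_new = Q.max(axis=1)
    h_new = h_new - h_new[0]   # subtract a reference state to keep h bounded
    if np.max(np.abs(h_new - h)) < 1e-10:
        h = h_new
        break
    h = h_new

policy = (r + P_hat @ h).argmax(axis=1)
print("greedy policy:", policy)
```

The point of the sketch is that both steps use only the samples themselves: no diameter, mixing time, or other prior problem knowledge enters the procedure.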


Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Wang, Zhichao, Wu, Denny, Fan, Zhou

arXiv.org Artificial Intelligence

Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK) defined by the nonlinear feature map of a feedforward neural network. However, existing results only establish weak convergence of the empirical eigenvalue distribution, and fall short of providing precise quantitative characterizations of the ''spike'' eigenvalues and eigenvectors that often capture the low-dimensional signal structure of the learning problem. In this work, we characterize these signal eigenvalues and eigenvectors for a nonlinear version of the spiked covariance model, including the CK as a special case. Using this general result, we give a quantitative description of how spiked eigenstructure in the input data propagates through the hidden layers of a neural network with random weights. As a second application, we study a simple regime of representation learning where the weight matrix develops a rank-one signal component over training and characterize the alignment of the target function with the spike eigenvector of the CK on test data.
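The spike propagation described in this abstract can be observed numerically. The sketch below is a toy simulation, not the paper's theory: sizes, the spike strength, and the seed are assumptions, and the features are centered to remove the trivial mean component of the ReLU before forming the Conjugate Kernel Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Spiked input data: n samples in d dimensions with a rank-one signal
# direction u of strength `signal` (toy values chosen for illustration).
n, d, width = 500, 200, 200
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
signal = 5.0
X = rng.standard_normal((n, d)) + signal * rng.standard_normal((n, 1)) * u

# One hidden layer with random Gaussian weights and ReLU, as in the CK setup.
W = rng.standard_normal((d, width)) / np.sqrt(d)
F = np.maximum(X @ W, 0.0)
F -= F.mean(axis=0)            # center to remove the trivial mean component

# Conjugate Kernel Gram matrix: its top eigenvalue separates from the bulk
# when the input spike is strong enough.
CK = F @ F.T / width
eigs = np.linalg.eigvalsh(CK)[::-1]
print("top CK eigenvalues:", eigs[:3])
```

With these settings the leading eigenvalue sits well above the bulk edge, a qualitative instance of the spiked eigenstructure of the input propagating through the random hidden layer.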


Expressive Power of ReLU and Step Networks under Floating-Point Operations

Park, Yeachan, Hwang, Geonho, Lee, Wonyeol, Park, Sejun

arXiv.org Artificial Intelligence

The study of the expressive power of neural networks has investigated the fundamental limits of neural networks. Most existing results assume real-valued inputs and parameters as well as exact operations during the evaluation of neural networks. However, neural networks are typically executed on computers that can only represent a tiny subset of the reals and apply inexact operations. In this work, we analyze the expressive power of neural networks under a more realistic setup: when we use floating-point numbers and operations. Our first set of results assumes floating-point operations where the significand of a float is represented by finite bits but its exponent can take any integer value. Under this setup, we show that neural networks using a binary threshold unit or ReLU can memorize any finite input/output pairs and can approximate any continuous function within a small error. We also show similar results on memorization and universal approximation when floating-point operations use finite bits for both significand and exponent; these results are applicable to many popular floating-point formats such as those defined in the IEEE 754 standard (e.g., 32-bit single-precision format) and bfloat16.
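The memorization result for threshold units can be illustrated with a classic one-hidden-layer construction carried out entirely in 32-bit floats. This is a minimal sketch, not the paper's construction: thresholds are placed between consecutive sorted inputs and output weights are successive differences of the targets, so the active units' weights telescope to each target value.

```python
import numpy as np

# Distinct scalar inputs and arbitrary targets, all stored as 32-bit floats.
x = np.float32([0.1, 0.5, 0.9, 1.3])
y = np.float32([2.0, -1.0, 0.5, 3.0])

order = np.argsort(x)
xs, ys = x[order], y[order]

# One hidden layer of binary threshold units: unit i fires iff input >= t_i,
# with thresholds between consecutive data points. Output weights are
# successive differences of the targets, so the partial sums telescope to y_i.
thresholds = np.concatenate(([xs[0]], (xs[:-1] + xs[1:]) / 2)).astype(np.float32)
weights = np.concatenate(([ys[0]], np.diff(ys))).astype(np.float32)

def step_net(t):
    fired = (t >= thresholds).astype(np.float32)   # binary threshold activations
    return np.float32(fired @ weights)

print([float(step_net(v)) for v in x])   # recovers [2.0, -1.0, 0.5, 3.0]
```

Because every threshold comparison and weight sum here stays within float32, the example memorizes the pairs exactly under inexact arithmetic, in the spirit of (though far simpler than) the finite-precision constructions the paper analyzes.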


A Finer Calibration Analysis for Adversarial Robustness

Awasthi, Pranjal, Mao, Anqi, Mohri, Mehryar, Zhong, Yutao

arXiv.org Machine Learning

We present a more general analysis of H-calibration for adversarially robust classification. By adopting a finer definition of calibration, we can cover settings beyond the restricted hypothesis sets studied in previous work. In particular, our results hold for most common hypothesis sets used in machine learning. We both fix some previous calibration results (Bao et al., 2020) and generalize others (Awasthi et al., 2021). Moreover, our calibration results, combined with the previous study of consistency by Awasthi et al. (2021), also lead to more general H-consistency results covering common hypothesis sets.